Machine learning-derived universal connector

ABSTRACT

In a computer-implemented method for endpoint management, a plurality of messages communicated between a target endpoint and a client are recorded, in a computer-readable memory. Ones of the messages are clustered into respective groups, where the respective groups correspond to respective operation types of the ones of the messages included therein. For the respective operation types, respective message structures used by the target endpoint are determined based on commonalities among the ones of the messages of the respective groups corresponding to the operation types. For one of the respective operation types, a request to the target endpoint is generated in accordance with a corresponding one of the respective message structures used by the target endpoint. Related computer systems and computer program products are also discussed.

BACKGROUND

Various embodiments described herein relate to computer systems, methods and program products and, more particularly, to virtualized computer systems, methods and computer program products.

Modern enterprise software environments may integrate a large number of software systems to facilitate complex business processes. Many of these software systems may interact with and/or rely on services provided by other systems (e.g., third-party systems or services) in order to perform their functionalities or otherwise fulfill their responsibilities, and thus, can be referred to as “systems of systems.” For example, some enterprise-grade identity management suites may support management and provisioning of users, identities, and roles in large organizations across a spectrum of different endpoint systems. Such systems can be deployed into large organizations or corporations, such as banks and telecommunications providers, and may be used to manage the digital identities of personnel and to control access of their vast and distributed computational resources and services.

In particular, identity management products can automate the process of granting and verifying application access based on each user's relationship and role with the organization (including but not limited to employees, administrators, contractors, customers or business partners), which can improve information technology (IT) flexibility and/or operational efficiencies. Such identity management products can also reduce security risks by on-boarding new users faster, and/or by ensuring users are only granted access that is appropriate to their function.

As noted above, one use case for identity management products is the ability to provision access to disparate endpoint systems. For example, an identity may be associated with a single authoritative user stored in a corporate store (such as Microsoft Active Directory), but may be associated with numerous accounts in other managed endpoints such as SAP®, PeopleSoft®, Google Apps®, etc. Provisioning to these endpoints is typically the responsibility of a network component referred to as the Connector Server (CS). The Connector Server may utilize a number of connectors, each of which may be responsible for a different endpoint type. These connectors can thus act as a bridge to convert requests and data from a common format used within the Connector Server into the specific protocol(s) or client libraries used within or otherwise understandable by the endpoint.

BRIEF SUMMARY

According to some embodiments, in a computer-implemented method for endpoint management, a plurality of messages communicated between a target endpoint and a client are recorded, in a computer-readable memory. Ones of the messages are clustered into respective groups, where the respective groups correspond to respective operation types of the ones of the messages included therein. For the respective operation types, respective message structures used by the target endpoint are determined based on commonalities among the ones of the messages of the respective groups corresponding to the operation types. For one of the respective operation types, a request to the target endpoint is generated in accordance with a corresponding one of the respective message structures used by the target endpoint. The recording, the clustering, the determining, and the generating comprise operations performed by a processor.

In some embodiments, in determining the respective message structures for the respective operation types, constant and variable sections of the respective message structures may be identified for the respective operation types. The constant and variable sections of the respective message structures may be identified based on the commonalities among the ones of the messages of the respective groups corresponding to the respective operation types, and without knowledge or otherwise independent of a communications protocol used by the target endpoint.

In some embodiments, the ones of the messages of the respective groups may include requests to the target endpoint, and/or requests to the target endpoint and responses to the requests. In identifying the constant and variable sections of the respective message structures for the respective operation types, the requests of the respective groups may be aligned according to respective positions thereof, and the constant and variable sections of the respective message structures for the respective operation types may be identified based on a frequency of occurrence of the commonalities at the respective positions of the requests of the respective groups corresponding thereto indicated by the aligning.

In some embodiments, the commonalities may be common characters that are present at the respective positions of ones of the requests of the respective groups. Based on the respective message structures for the respective operation types, respective request prototypes may be created for the respective operation types. The request prototypes may include fields representing the constant and variable sections of the respective message structures used by the target endpoint for the respective operation types. The fields representing the constant sections of the respective request prototypes may include corresponding ones of the common characters.

In some embodiments, in generating the request to the target endpoint for the one of the respective operation types, one of the request prototypes corresponding to the one of the operation types may be selected. For the one of the request prototypes that was selected, ones of the fields representing the variable sections may be populated with data from an external database to generate the request to the target endpoint.

In some embodiments, the data may correspond to a user identity in a system employing a message structure different from the respective message structures used by the target endpoint. The request may associate the user identity with an account of the target endpoint. For example, the request may be used to provision a user identity on a first system for use on the target endpoint.

In some embodiments, the request to the target endpoint may be generated without or otherwise independent of a connector component that is specific to the target endpoint and/or a communications protocol used thereby. In some embodiments, the communications protocol used by the target endpoint may not be a generic protocol.

In some embodiments, the ones of the messages may be clustered into the respective groups responsive to receiving user input indicating the respective operation types of the ones of the messages.

In some embodiments, the ones of the messages may be clustered into the respective groups based on similarities therebetween calculated using a distance function, and then the respective groups may be classified as corresponding to the respective operation types responsive to receiving user input indicative of the correspondence between the respective groups and the respective operation types.

According to further embodiments, a computer system includes a processor, and a memory coupled to the processor. The memory includes computer readable program code embodied therein that, when executed by the processor, causes the processor to record, in a computer-readable memory, a plurality of messages communicated between a target endpoint and a client, and cluster ones of the messages into respective groups. The respective groups correspond to respective operation types of the ones of the messages included in the respective groups. The memory further includes computer readable program code embodied therein that, when executed by the processor, causes the processor to determine, for the respective operation types, respective message structures used by the target endpoint based on commonalities among the ones of the messages of the respective groups corresponding to the respective operation types, and generate, for one of the respective operation types, a request to the target endpoint in accordance with a corresponding one of the respective message structures used by the target endpoint.

According to still further embodiments, a computer program product includes a computer readable storage medium having computer readable program code embodied in the medium. The computer readable program code includes computer readable code to record, in a computer-readable memory, a plurality of messages communicated between a target endpoint and a client, and computer readable code to cluster ones of the messages into respective groups. The respective groups correspond to respective operation types of the ones of the messages included therein. The computer readable program code further includes computer readable code to determine, for the respective operation types, respective message structures used by the target endpoint based on commonalities among the ones of the messages of the respective groups corresponding to the respective operation types, and computer readable code to generate, for one of the respective operation types, a request to the target endpoint in accordance with a corresponding one of the respective message structures used by the target endpoint.

It is noted that aspects described herein with respect to one embodiment may be incorporated in different embodiments although not specifically described relative thereto. That is, all embodiments and/or features of any embodiments can be combined in any way and/or combination. Moreover, other systems, methods, and/or computer program products according to embodiments will be or become apparent to one with skill in the art upon review of the following drawings and detailed description. It is intended that all such additional systems, methods, and/or computer program products be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects of the present disclosure are illustrated by way of example and are not limited by the accompanying figures with like references indicating like elements.

FIG. 1 is a block diagram of a computing system or environment employing a universal connector for identity management in accordance with some embodiments of the present disclosure.

FIG. 2 is a block diagram that illustrates a computing device implementation of a universal connector in accordance with some embodiments of the present disclosure.

FIG. 3 is a block diagram that illustrates a software/hardware architecture for a universal connector in accordance with some embodiments of the present disclosure.

FIGS. 4A-4C are block diagrams illustrating operations performed by a universal connector in accordance with some embodiments of the present disclosure.

FIGS. 5 and 6A-6B are flowcharts illustrating methods of operation of a universal connector in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION

As will be appreciated by one skilled in the art, aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of software and hardware implementations that may all generally be referred to herein as a “circuit,” “module,” “component,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.

Any combination of one or more computer readable media may be utilized. The computer readable media may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an appropriate optical fiber with a repeater, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

As described herein, a computing system or environment may include one or more hosts, operating systems, peripherals, and/or applications. Machines in a same computing system or environment may have shared memory or resources, may be associated with the same or different hardware platforms, and/or may be located in the same or different physical locations. Computing systems/environments described herein may refer to a virtualized environment (such as a cloud environment) and/or a physical environment.

As used herein, an endpoint may refer to a specific installation of a platform or application, such as Active Directory or Microsoft Exchange, which may communicate with an identity manager to synchronize information. An endpoint is typically managed by a connector server using a specific connector, which may refer to software that enables communication between a server and an endpoint system. Connectors are typically responsible for representing each of the object classes in an endpoint in a consistent manner. For example, connectors may translate add, modify, delete, rename, and search LDAP operations on objects into corresponding actions against the endpoint system.

Some embodiments of the present disclosure may arise from a realization that a cost of supporting new managed endpoint types can include significant overhead in creating new connectors for the respective endpoint types. A connector may include at least two elements: the protocol and the data schema. In many cases, providing the data schema for the connector may not be problematic, as attributes are typically described in a data-driven manner that can allow for mapping between different names/types using metadata at run time. Providing the protocol for the connector, however, may typically require coding unless the endpoint supports a generic protocol such as LDAP (Lightweight Directory Access Protocol) or SQL (Structured Query Language). Such additional coding can be prohibitively expensive and/or impractical in a time-constrained situation, such as a sales proof of concept (POC).

Accordingly, embodiments of the present disclosure can provide endpoint management systems, methods, and computer program products that allow for the addition of new endpoint types without requiring additional coding operations and/or the use of endpoint-specific connectors or protocols. In particular, some embodiments of the present disclosure provide a “universal connector” that is configured to record and store network traffic (including request messages and/or response messages; also referred to herein as message transactions) communicated between an existing client and a target endpoint, classify the recorded messages by operation type, and thus learn or otherwise determine message structures of the communications protocol(s) used by the target endpoint based on the classification of the recorded messages. In particular, once a sufficiently large collection of messages is recorded, the messages can be analyzed to determine commonalities in structure and/or format, which may be indicative of constant and variable segments or sections of the message structures used by the target endpoint for each operation type. Once determined, the message structures can be used to produce templates that can be populated with mapped attributes to generate requests having the structure/format expected by the target endpoint.

Identity management in accordance with embodiments of the present disclosure may thus offer advantages in that additional coding operations and/or the use of endpoint-specific protocols may not be required to add new endpoint types. Furthermore, embodiments of the present disclosure may be language agnostic. For example, a universal connector as described herein may be implemented as a Java application, even though an endpoint vendor may supply C/C++ client libraries.

FIG. 1 is a block diagram illustrating a computing system or environment employing a universal connector for protocol-independent endpoint management in accordance with some embodiments of the present disclosure. Referring now to FIG. 1, the environment 100 includes a plurality of endpoints 111A, 111B . . . 111N, at least one existing client 105 of the endpoints 111A, 111B . . . 111N, and a universal connector server 115. The endpoints 111A, 111B . . . 111N may be implemented on one or more servers, and may provide one or more software services upon which the existing client(s) 105 depend or otherwise interact to fulfill responsibilities. The environment 100 further includes an identity manager application server 101 and an identity data store 110.

The identity data store 110 may include user identity data for a plurality of users associated with a particular system or server, which may be managed by the identity manager application server 101. The identity manager application server 101 may be tasked with provisioning one or more of the user identities stored in the data store 110 for use with accounts in one or more of the endpoint systems 111A, 111B, . . . 111N. The use of a single identity for a given user across multiple systems may ease tasks for administrators and users, for example, by reducing complexity with respect to access monitoring and verification, by allowing an organization control over the granting of excessive privileges, and/or by allowing for the addition of services for both internal users and customers.

The universal connector server 115 is configured to enable communication between the identity manager application server 101 and one or more of the endpoints 111A, 111B . . . 111N. However, rather than converting requests and data from a common format (e.g., used within a conventional connector) into the format expected by a target endpoint, the universal connector server 115 is configured to infer or otherwise determine the message structures, per operation type, used by one or more of the endpoints 111A, 111B, . . . 111N by observing and analyzing prior communications therewith. The universal connector server 115 can thus automatically generate requests and data in accordance with the specific protocol(s) or client libraries used within or otherwise understandable by the endpoints 111A, 111B . . . 111N, without prior/preprogrammed knowledge, additional coding, or otherwise independent of the particular communications protocols used by the endpoints 111A, 111B . . . 111N.

In the embodiment shown in FIG. 1, the universal connector server 115 includes a virtual directory 112, a message monitor 125, a message analyzer 128, a message library 130, a prototype generator 150, and a request generator 160. The virtual directory 112 provides an access point for the identity manager application server 101. For example, the virtual directory 112 may be a lightweight abstraction layer that resides between client applications and disparate types of identity-data repositories, such as proprietary and standard directories, databases, web services, and applications. The virtual directory 112 receives requests Req_(in) from the identity manager application server 101 and directs the requests Req_(in) to the request generator 160 by abstracting and virtualizing data. The virtual directory 112 may integrate identity data from multiple heterogeneous identity data stores 110 and may present the identity data as though it were from one source. The virtual directory 112 may thus consolidate user identity and/or other data stored in a distributed computing environment.

The message library 130 stores a set of messages (including requests and/or associated responses) sampled from prior communications with (i.e., to and/or from) the existing client(s) 105 and the target endpoints 111A, 111B, . . . 111N by the message monitor 125. The message analyzer 128 clusters similar ones of the messages communicated with a particular target endpoint 111A, 111B, . . . , or 111N into respective groups, where the similar messages in each group may correspond to a same operation type. The message analyzer 128 further determines a message structure used by the particular target endpoint for one or more of the operation types, based on commonalities among the messages of each group. The request generator 160 thereby generates a request to the target endpoint for a particular operation type in accordance with the determined message structure used by the particular target endpoint (for example, by populating a template created by the prototype generator 150), as described in detail below.

The environment 100 of FIG. 1 operates as follows. The existing client(s) 105 are observed communicating with endpoint(s) 111A, 111B, . . . 111N via the message monitor 125, for example, in a pre-processing or training stage. The client(s) 105 and the endpoint(s) 111A, 111B, . . . 111N may communicate via a network 120 using a communications mode or protocol, such as Lightweight Directory Access Protocol (LDAP) messages or Simple Object Access Protocol (SOAP) messages, which may be conveyed using Hypertext Transfer Protocol (HTTP) with an Extensible Markup Language (XML) serialization. The message monitor 125 may include or implement a network monitoring tool, such as Wireshark®, and records messages communicated between (i.e., to and/or from) the client(s) 105 and the endpoint(s) 111A, 111B, . . . 111N. The message monitor 125 stores these messages in the message library 130.

The messages communicated between the client(s) 105 and the endpoint(s) 111A, 111B, . . . 111N may include requests only, request-response pairs, or a sequence of requests/responses for a particular operation. The sampled request messages and/or response messages stored in the message library 130 may each contain two types of information: (i) protocol structure information (such as the operation name), used to describe software services behaviors; and (ii) payload information, used to express software systems' attributes. However, the message monitor 125 may record the messages without knowledge of the protocol, operation name, and/or attributes thereof.

For a given endpoint, a number of messages communicated thereto from the client(s) 105 are recorded by the message monitor 125. The recorded messages may thus include multiple messages of different operation types. As such, the collection of sampled messages communicated with a particular endpoint 111A, 111B, . . . , or 111N may be indicative of the specific communications protocol used by that endpoint. In particular, the repeated occurrence of information among sections of the sampled messages may indicate that these sections correspond to protocol structure for a particular operation type defined in the protocol specification. In contrast, variability among sections of the sampled messages may indicate that these sections include payload information, which is typically diverse according to various interaction operation parameter values.

Message analysis of the messages stored in the library 130 by the message analyzer 128 may thus be used to distinguish protocol-related information (indicated by relatively constant sections of the message structure) from application-specific information (indicated by variable sections of the message structure) by comparing sections of similar messages, without prior knowledge of the particular protocol used in the message transaction with the target endpoint. In embodiments of the present disclosure, one or more algorithms may be used to classify large numbers of stored messages (e.g., thousands or even millions) into groups or clusters of similar messages, which may assist and/or reduce the processing burden in creating request templates (also referred to herein as request prototypes) for each observed operation type. At a high level, these algorithms can be viewed as an application of clustering and sequence alignment techniques for inferring message structure information. For example, the message analyzer 128 may be configured to align the messages from the message library 130 in a manner suitable for comparison of characters, byte positions, n-grams, and/or other portions thereof, in order to group the messages by operation type and determine message structures for each operation type based on commonalities among the messages of each group as described herein.

The message library 130 thus provides historical message transaction data that is used for training the universal connector server 115 to learn the message structures of the operation types used by one or more of the target endpoints 111A, 111B, . . . 111N. The training process may be applied not only for each type of endpoint, but also for alternative configurations of endpoints, particularly where the configuration is outside the previously observed training data. As described below, requests directed to new endpoint types may thereby be generated in accordance with the determined message structure used by a target endpoint for a particular operation type, without the need for additional coding and/or without prior knowledge of the protocols used by the new endpoint types.

As shown in FIG. 1, operation of the universal connector server 115 according to some embodiments is split into two stages, that is, the pre-processing or training stage and the run-time stage. At the pre-processing stage, the message analyzer 128 is used to partition the message library 130 into groups or “clusters” of messages of a same operation type, for instance, as specified by user input or as determined using a data clustering method. Any clustering method may be used, such as the Visual Assessment of cluster Tendency (VAT), Bond Energy Algorithm (BEA), K-Means, a hierarchical clustering algorithm, etc. The clustering method may include human supervision (such as in VAT) or may be fully automated in some embodiments. The similarity between messages may be determined by a distance function (such as the Needleman-Wunsch sequence alignment algorithm). The distance function may be applied to cluster the messages based on request similarity, response similarity, or a combination of request and response similarities.
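By way of non-limiting illustration only, the clustering of recorded messages by a distance function may be sketched as follows. The sketch assumes character-level Needleman-Wunsch scoring with illustrative match, mismatch, and gap weights, a normalization of the score to the range [0, 1], and a simple greedy grouping against the first member of each cluster; the function names, the weights, and the threshold of 0.3 are hypothetical choices made for this example and are not taken from the present disclosure, which permits any clustering method.

```python
from typing import List


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of a and b (Needleman-Wunsch, score only)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def distance(a: str, b: str) -> float:
    """Normalized dissimilarity in [0, 1]; 0.0 means identical messages."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    # With the default weights the score lies in [-longest, longest].
    return (longest - nw_score(a, b)) / (2 * longest)


def cluster(messages: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Greedy grouping: join the first cluster whose leader is close enough."""
    clusters: List[List[str]] = []
    for msg in messages:
        for group in clusters:
            if distance(msg, group[0]) <= threshold:
                group.append(msg)
                break
        else:
            clusters.append([msg])
    return clusters
```

In this sketch, messages that share a long operation-specific prefix fall into the same group even though their payload values differ. The resulting groups could then be labeled with operation types responsive to user input, consistent with the embodiments described above.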

The message analyzer 128 is further configured to determine, for the respective operation types, respective message structures used by the target endpoint based on commonalities among the ones of the messages of the respective groups corresponding thereto. In particular, the message analyzer 128 may use a multiple sequence alignment algorithm to align the request messages of each group according to respective positions thereof to identify constant and variable sections of the request message structure for the operation type corresponding thereto, based on a frequency of occurrence of common characters at the respective positions of the aligned requests.

The prototype generator 150 is configured to generate request templates (referred to herein as request prototypes) for each of the message clusters, including fields representing the constant and variable sections for each of the operation types. The fields of a request prototype representing the constant sections of a request message structure may include common patterns or characters among the request messages of the corresponding cluster/group. For example, responsive to multiple sequence alignment of the request messages of a group, the prototype generator 150 may select the most commonly occurring byte/character/substring at each position for inclusion in a corresponding constant section of the request prototype, provided the byte/character/substring has a relative frequency above a predetermined threshold. The request prototype may thus include a particular byte/character/substring at a particular position when there is a consensus (among the requests of the corresponding cluster/group) as to the commonality of the byte/character/substring at that position, while positions for which there is no consensus may be represented as a gap in the request prototype. The fields of a request prototype representing the variable sections of a request message structure may be populated by the request generator 160 at run-time.
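As a minimal sketch of the consensus selection described above, the following example derives a request prototype from requests that are assumed to have already been aligned to equal length, with any padding represented by a gap symbol. The function name build_prototype, the representation of fields as (kind, text) pairs, the gap symbol, and the threshold value of 0.8 are assumptions introduced for illustration only.

```python
from collections import Counter
from typing import List, Tuple

Field = Tuple[str, str]  # ("const", text) or ("var", "")


def build_prototype(aligned: List[str], threshold: float = 0.8,
                    gap: str = "\x00") -> List[Field]:
    """Derive constant and variable fields from pre-aligned requests."""
    if not aligned or len({len(m) for m in aligned}) != 1:
        raise ValueError("expected a non-empty list of equal-length aligned requests")
    fields: List[Field] = []
    for column in zip(*aligned):
        char, count = Counter(column).most_common(1)[0]
        # A position is constant only when its most common character is not
        # the gap symbol and its relative frequency exceeds the threshold.
        is_const = char != gap and count / len(aligned) > threshold
        kind, text = ("const", char) if is_const else ("var", "")
        if fields and fields[-1][0] == kind:
            # Merge adjacent positions of the same kind into one field.
            fields[-1] = (kind, fields[-1][1] + text)
        else:
            fields.append((kind, text))
    return fields
```

For instance, for the three aligned requests "ADD name=amy;", "ADD name=bob;", and "ADD name=cat;", this sketch yields a constant field "ADD name=", one variable field, and a constant field ";". Positions without consensus collapse into a single variable field, corresponding to a gap in the request prototype.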

At run-time, responsive to receiving a request Req_(in) from the identity manager application server 101 at the virtual directory 112, the request generator 160 is configured to synthesize or otherwise generate a request message Req_(out)A, Req_(out)B . . . , or Req_(out)N to a corresponding one of the target endpoints 111A, 111B, . . . , or 111N by selecting one of the request prototypes having the same operation type (e.g., Add, Delete, Modify, etc.) as the request Req_(in) and substituting corresponding values from the request Req_(in) into one or more of the variable fields of the selected request prototype. The request generator 160 may thereby generate a request Req_(out)A, Req_(out)B . . . , or Req_(out)N of the desired operation type in the format expected by a particular one of the target endpoints 111A, 111B . . . or 111N, without additional coding or otherwise independent of knowledge indicating the communications protocol(s) used by the target endpoint.
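The run-time selection and population of a request prototype may be sketched, purely for purposes of illustration, as follows. The sketch assumes that learned prototypes are kept in a dictionary keyed by operation type and that each prototype is a list of (kind, text) fields; the function name generate_request, the dictionary layout, and the positional mapping of values onto variable fields are hypothetical simplifications, and a practical implementation may instead map named attributes onto the variable fields.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, str]  # ("const", text) or ("var", "")


def generate_request(prototypes: Dict[str, List[Field]], operation: str,
                     values: List[str]) -> str:
    """Select the prototype for an operation type and fill its variable fields."""
    try:
        prototype = prototypes[operation]
    except KeyError:
        raise ValueError(f"no request prototype learned for {operation!r}") from None
    slots = sum(1 for kind, _ in prototype if kind == "var")
    if slots != len(values):
        raise ValueError(f"prototype has {slots} variable fields, got {len(values)} values")
    remaining = iter(values)
    # Constant fields are copied verbatim; variable fields take the next value.
    return "".join(text if kind == "const" else next(remaining)
                   for kind, text in prototype)
```

Given a prototype for an "Add" operation consisting of a constant field "ADD name=", one variable field, and a constant field ";", calling generate_request with the single value "dana" produces the string "ADD name=dana;" without reference to the protocol that defines that format.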

It will be appreciated that in accordance with various embodiments of the present disclosure, the universal connector server 115 may be implemented in a single server, separate servers, or a network of servers (physical and/or virtual), which may be co-located in a server farm or located in different geographic regions. In other words, a universal connector as described herein may be run as an on-premise component, and/or in the cloud as software as a service (SaaS). The SaaS implementation may offer a RESTful API to access the functionality of the universal connector for both the training phase and the run-time phase. The endpoints 111A-111N may likewise include a single server, separate servers, or a network of servers (physical and/or virtual). As such, one or more of the client(s) 105, the endpoints 111A-111N, and the universal connector server 115 may be co-located or remotely located, and communicatively coupled by one or more networks 120. The network 120 may be a global network, such as the Internet or other publicly accessible network. Various elements of the network 120 may be interconnected by a wide area network (WAN), a local area network (LAN), an Intranet, and/or other private network, which may not be accessible by the general public. Thus, the communication network 120 may represent a combination of public and private networks or a virtual private network (VPN). The network 120 may be a wireless network, a wireline network, or may be a combination of both wireless and wireline networks. More generally, although FIG. 1 illustrates an example of a computing environment 100, it will be understood that embodiments of the present disclosure are not limited to such a configuration, but are intended to encompass any configuration capable of carrying out the operations described herein.

FIG. 2 illustrates an example computing device 200 in accordance with some embodiments of the present disclosure. The device 200 may be used, for example, to implement the universal connector server 115 in the system 100 of FIG. 1 using hardware, software implemented with hardware, firmware, tangible computer-readable storage media having instructions stored thereon, or a combination thereof, and may be implemented in one or more computer systems or other processing systems. The computing device 200 may also be a virtualized instance of a computer. As such, the devices and methods described herein may be embodied in any combination of hardware and software.

As shown in FIG. 2, the computing device 200 may include input device(s) 205, such as a keyboard or keypad, a display 210, and a memory 212 that communicates with one or more processors 220 (generally referred to herein as “a processor”). The computing device 200 may further include a storage system 225, a speaker 245, and I/O data port(s) 235 that also communicate with the processor 220. The memory 212 may include computer readable program code implementing a universal connector 215 stored therein. The universal connector 215 may be configured to determine message structure(s) of a communications protocol or client library used by a target system/endpoint by analyzing messages communicated between the target system/endpoint and an existing client, and to generate requests to the target system/endpoint according to the determined message structure(s), as described in detail herein.

The storage system 225 may include removable and/or fixed non-volatile memory devices (such as but not limited to a hard disk drive, flash memory, and/or like devices that may store computer program instructions and data on computer-readable media), volatile memory devices (such as but not limited to random access memory), as well as virtual storage (such as but not limited to a RAM disk). The storage system 225 may include a message library 230 storing data (including but not limited to requests and/or associated responses) communicated between a target system/endpoint and an existing client. Although illustrated in separate blocks, the memory 212 and the storage system 225 may be implemented by a same storage medium in some embodiments. The input/output (I/O) data port(s) 235 may include a communication interface and may be used to transfer information in the form of signals between the computing device 200 and another computer system or a network (e.g., the Internet). The communication interface may include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, or the like. These components may be conventional components, such as those used in many conventional computing devices, and their functionality, with respect to conventional operations, is generally known to those skilled in the art. Communication infrastructure between the components of FIG. 2 may include one or more device interconnection buses such as Ethernet, Peripheral Component Interconnect (PCI), and the like.

In embodiments of the present disclosure, the universal connector 215 is configured to record and analyze observable messages communicated between an existing client and a desired or target system/endpoint, and to infer information regarding the protocol specification and/or client libraries used by the endpoint, particularly in terms of the message structure(s) of the messages used in communication with the target endpoint. In particular, the universal connector 215 may be configured to pre-process the messages stored in the message library 230 to provide insight into the constant and variable sections of the message structures (as well as encoding rules in some embodiments) for the protocol(s) used by the target endpoint for each operation type. For example, message analysis may be used by the universal connector 215 to cluster or group the messages stored in the library 230 by operation type, and to distinguish, for each operation type, the constant sections of the message structure (which may include protocol-related information) from the variable sections of the message structure (which may include application-specific or payload information) with no prior knowledge of the particular communications protocol used by the target endpoint. In some embodiments, information regarding the operation types and/or fields of the messages stored in the library 230 may be received or otherwise specified via user input for cluster/group generation.

In some embodiments, one or more distance functions may be used by the universal connector 215 to indirectly identify similar ones of the stored requests and/or responses (for cluster/group generation) and/or to identify common features among the requests in a cluster/group (for request prototype generation). One notion of similarity that may be used in some embodiments of the present disclosure is the edit distance between two sequences, which indicates the minimum number of modifications (insertions, deletions, and/or substitutions) in order to obtain one sequence from the other. That is, the distance function may be used to compute the number of modifications or alterations among the stored requests or responses (such that ones having similar or lowest relative distances can be clustered in the same group), and/or to compute the frequency of occurrence of common characters among the requests of a group at respective character positions (such that frequently occurring characters can be included in a request prototype). In some embodiments, different distance functions may be automatically selected for cluster/group generation and request prototype generation, for example, based on a particular notion of similarity.
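By way of illustration only, the edit distance described above may be sketched as follows. The function name `edit_distance` and the unit cost assigned to each modification are assumptions made for this example and are not taken from the disclosure.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence `a` into sequence `b` (unit costs assumed)."""
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]
```

Messages with a low pairwise distance, such as two Add requests that differ only in a name value, would then be candidates for the same cluster/group.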

Using the generated request prototypes, a translation function may be used by the universal connector 215 at run-time to generate or synthesize a request of a specific operation type in accordance with the communications protocol expected by the target system/endpoint. As both the protocol-related information (indicated by the constant sections) and the application-related information (i.e., the variable sections) of the message structure may be distinguished by the pre-processing and/or distance calculations discussed above, the translation function may be configured to automatically generate requests to the target endpoint in the expected format by filling in contents of the variable section(s) of a request prototype corresponding to a specified operation type using field substitution as described herein.

FIG. 3 illustrates a computing system or environment for providing connector services for protocol-independent endpoint management in accordance with further embodiments of the present disclosure. In particular, FIG. 3 illustrates a processor 320 and memory 312 that may be used in computing devices or other data processing systems, such as the computing device 200 of FIG. 2 and/or the universal connector server 115 of FIG. 1. The processor 320 communicates with the memory 312 via an address/data bus 310. The processor 320 may be, for example, a commercially available or custom microprocessor, including, but not limited to, digital signal processor (DSP), field programmable gate array (FPGA), application specific integrated circuit (ASIC), and multi-core processors. The memory 312 may be a local storage medium representative of the one or more memory devices containing software and data in accordance with some embodiments of the present disclosure. The memory 312 may include, but is not limited to, the following types of devices: cache, ROM, PROM, EPROM, EEPROM, flash, SRAM, and DRAM.

As shown in FIG. 3, the memory 312 may contain multiple categories of software and/or data installed therein, including (but not limited to) an operating system block 302 and a universal connector block 315. The operating system 302 generally controls the operation of the computing device or data processing system. In particular, the operating system 302 may manage software and/or hardware resources and may coordinate execution of programs by the processor 320, for example, in providing the identity management operations illustrated in FIG. 1.

The universal connector block 315 is configured to carry out some or all of the functionality of the message analyzer 128, the prototype generator 150, and the request generator 160 of FIG. 1. In particular, the universal connector block 315 includes a message analysis function 328, a prototype generation function 350, and a request generation function 360.

Responsive to accessing a message library (such as the message library 130 of FIG. 1) including a set of messages (e.g., requests and/or associated responses) communicated between a client (such as the client 105 of FIG. 1) and a target endpoint (such as one of the endpoints 111A . . . 111N of FIG. 1), the message analysis function 328 groups or “clusters” similar ones of the messages to partition the message library into groupings of messages having a same operation type for a particular target endpoint, a particular type of target endpoint, and/or alternative configurations thereof. The similarity can be determined using one or more distance measures or functions (such as the Needleman-Wunsch sequence alignment algorithm), or may be determined responsive to receiving user input indicative of the operation types of the messages stored in the library.
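For illustration, a minimal sketch of a Needleman-Wunsch global alignment of two messages is given below. The scoring values (match +1, mismatch -1, gap -1), the function name `needleman_wunsch`, and the use of "-" as the gap symbol are assumptions chosen for the example; the disclosure does not specify them, and a real message may itself contain "-" characters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

A higher alignment score indicates more similar messages, so the score (or a distance derived from it) could serve as the similarity measure during cluster/group generation.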

Based on the groupings or clusters of messages, the message analysis function 328 further determines the message structures used by the particular target endpoint for each of the operation types, based on the commonalities among the ones of the messages of the respective groups. In particular, the message analysis function 328 may align corresponding sections or positions of the request messages of each group, and may infer or otherwise identify constant and variable sections of the request message structure for each group of messages based on whether a frequency of occurrence of the characters at the corresponding sections or positions of the aligned requests exceeds a predefined threshold.

The prototype generation function 350 generates a request prototype for each group of messages (and thus, each operation type in the embodiment of FIG. 3), where each request prototype includes fields representing the constant and variable sections of the respective message structures of the corresponding group of messages. The prototype generation function 350 may generate the request prototype for a particular operation type to include the common characters of the requests in the corresponding cluster/group in the fields representing the constant sections identified by the message analysis function 328, while the fields representing the variable sections may be populated at run-time by the request generator 160. For example, the prototype generation function 350 may generate a request prototype to include a particular byte, character, or substring of characters at particular constant fields thereof based on a frequency of occurrence in respective positions of the requests of each cluster/group, as indicated by the message analysis function 328.

In some embodiments, the message analysis function 328 may use a frequency table to identify the bytes/characters/substrings which occur at respective positions in more than a predetermined percentage of the requests in a cluster/group. To calculate the frequency table, request messages in the cluster/group may be aligned, for example, using a multiple sequence alignment algorithm (such as CLUSTALW). After completing a multiple sequence alignment, the most commonly occurring byte/character/substring at each position may be selected for inclusion in a corresponding constant section of the request prototype by the prototype generation function 350, provided the byte/character/substring has a relative frequency above a predetermined threshold. In other words, the prototype generation function 350 may generate a request prototype to include a particular byte/character/substring at a particular position when there is a consensus (among the requests of the corresponding cluster/group) as to the commonality of the byte/character/substring at that position. Positions for which there is no consensus may be left as a gap in the request prototype.
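The consensus calculation described above may be sketched as follows, assuming the requests have already been aligned to equal length by some multiple sequence alignment step (not shown). The names `consensus_prototype` and `render`, the 0.8 threshold, and the use of `None` to represent a gap are hypothetical choices made for this example.

```python
from collections import Counter


def consensus_prototype(aligned_requests, threshold=0.8):
    """Return one entry per position: the consensus character, or None
    (a gap) when no character exceeds the relative-frequency threshold."""
    length = len(aligned_requests[0])
    if any(len(request) != length for request in aligned_requests):
        raise ValueError("requests must be aligned to the same length")
    prototype = []
    for position in range(length):
        # Frequency table for this position across the cluster/group.
        counts = Counter(request[position] for request in aligned_requests)
        symbol, count = counts.most_common(1)[0]
        frequency = count / len(aligned_requests)
        prototype.append(symbol if frequency > threshold else None)
    return prototype


def render(prototype, gap_marker="?"):
    """Show a prototype as text, with gaps drawn as `gap_marker`."""
    return "".join(gap_marker if s is None else s for s in prototype)
```

With three aligned Add requests that differ only in the name value, the tag characters reach consensus and become constant sections, while the name positions are left as gaps to be populated at run-time.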

It will be understood that these and/or other operations of the message analysis function 328 and the prototype generation function 350 may be performed as a pre-processing step, prior to any request generation. Also, as noted above, the distance function utilized for request prototype generation may be the same as or different than the distance function used during cluster/group generation. However, the distance function utilized for request prototype generation may only compare the requests of each group, while the distance function used during cluster/group generation may compare information from the stored requests and/or the responses in order to cluster the messages into the respective groups.

Still referring to FIG. 3, at run-time, the request generation function 360 performs request generation by selecting one of the request prototypes corresponding to a desired operation type, and populating one or more of the variable fields of the selected request prototype. For example, the request generation function 360 may receive a request from a virtual directory (such as the virtual directory 112), may select one of the request prototypes having the same operation type as the virtual directory's request, and may substitute corresponding values/attributes from the virtual directory's request into one or more of the variable fields of the selected request prototype. The request generation function 360 may thereby generate a request of the desired operation type in the format expected by the target endpoint, independent of receiving data or other knowledge indicating the communications protocol(s) used by the target endpoint.

Although FIG. 3 illustrates example hardware/software architectures that may be used in a device, such as the computing device 200 of FIG. 2, to provide a universal connector in accordance with some embodiments described herein, it will be understood that the present disclosure is not limited to such a configuration but is intended to encompass any configuration capable of carrying out operations described herein. Moreover, the functionality of the computing device 200 of FIG. 2 and the hardware/software architecture of FIG. 3 may be implemented as a single processor system, a multi-processor system, a processing system with one or more cores, a distributed processing system, or even a network of stand-alone computer systems, in accordance with various embodiments.

FIGS. 4A-4C are block diagrams illustrating operations performed by a universal connector in accordance with embodiments of the present disclosure to create a message library, such as the message library 130 of FIG. 1. As shown in FIGS. 4A-4C, the universal connector may be configured to support endpoint request operations including, but not limited to, Establish Connection (e.g., Login), Add, Delete, Modify, Lookup, Search, and/or Disconnect (e.g., Logout). To support these endpoint operations, a collection or database of previous messages communicated with the desired endpoint may be observed, recorded, and analyzed to determine the message structure for each operation.

In particular, as shown in FIGS. 4A-4C, this can be achieved by recording and storing traffic through a proxy mechanism, such as a message monitor 425. The message monitor 425 observes and stores messages for several operations transmitted from an existing client to a desired endpoint, and/or vice versa. For example, in FIG. 4A, the observed requests include Add, Delete, Modify, Lookup, and Search requests communicated from existing client 405A to endpoint 411A. In FIG. 4B, the observed protocol may include only a single request from existing client 405B, and the response thereto from endpoint 411B. In FIG. 4C, the observed protocol may involve several requests and responses, illustrated by way of example as an Add operation that includes separate messages/responses communicated between existing client 405C and endpoint 411C to create a user and set a password.

Message recording may continue in order to store enough messages (of each operation type) to determine a message structure of the desired operation type(s). Once a sufficient number/amount of messages for each operation type is recorded, the universal connector can learn the protocol and/or otherwise determine the message structure for each operation type. Furthermore, based on a sufficient collection of messages, the universal connector can analyze and produce request templates for each operation type during the pre-processing stage, which can be subsequently populated by the universal connector at run-time to generate requests for each operation type. For example, for a particular endpoint several Add messages may be observed by the message monitor 425, such as:

<Add>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
</Add>
<Add>
    <FirstName>Jane</FirstName>
    <LastName>Doe</LastName>
</Add>

By analyzing the message segments that change and mapping those to known attributes, the universal connector can generate a request template or prototype for the Add operation:

<Add>
    <FirstName>${FirstName}</FirstName>
    <LastName>${LastName}</LastName>
</Add>

While the above Add request template or prototype is illustrated with reference to the XML format by way of example, embodiments of the present disclosure may similarly record and determine message structures for other textual formats (e.g., JSON) and/or binary formats (e.g., ASN.1, proprietary formats) as well.
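As a non-limiting sketch, run-time population of such a template might look like the following. The registry `REQUEST_PROTOTYPES`, the function `generate_request`, and the decision to XML-escape substituted values are assumptions introduced for this example. Python's `string.Template` is used only because its `${name}` placeholder syntax happens to match the notation above.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical registry: one learned request prototype per operation type.
REQUEST_PROTOTYPES = {
    "Add": Template(
        "<Add>"
        "<FirstName>${FirstName}</FirstName>"
        "<LastName>${LastName}</LastName>"
        "</Add>"
    ),
}


def generate_request(operation_type, values):
    """Select the prototype for `operation_type` and fill its variable fields."""
    prototype = REQUEST_PROTOTYPES[operation_type]
    # Escaping is an assumption suited to this XML example; another message
    # format would need its own encoding rules.
    escaped = {name: escape(str(value)) for name, value in values.items()}
    # substitute() raises KeyError when a variable field has no value.
    return prototype.substitute(escaped)
```

For instance, `generate_request("Add", {"FirstName": "John", "LastName": "Smith"})` reproduces the first observed Add message, apart from whitespace.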

Operations for providing connector services for protocol-independent endpoint management in accordance with some embodiments of the present disclosure will now be described with reference to the flowcharts of FIGS. 5 and 6A-6B. FIG. 5 is a flowchart illustrating operations for training a universal connector to determine the message structure of a target endpoint in accordance with some embodiments of the present disclosure, while FIGS. 6A and 6B illustrate further operations for training a universal connector in accordance with embodiments of the present disclosure in greater detail. In some embodiments, the operations of FIGS. 5 and 6A-6B may be performed by the universal connector 115, 215, or 315 illustrated in FIGS. 1-3 or subcomponents thereof.

Referring now to FIG. 5, messages communicated between a target endpoint and an existing client thereof are recorded and stored in a computer readable memory at block 500. For example, the messages may include add requests, delete requests, and/or modify requests transmitted from a particular client to a particular target endpoint. In some embodiments, the messages may also include responses returned by the target endpoint responsive to receiving the requests from the client. The recording operations of block 500 may continue for a duration sufficient to record messages of multiple operation types communicated between the client and the target endpoint.

At block 505, ones of the messages previously communicated between the target endpoint and the existing client are clustered into respective groups. The respective groups may correspond to the operation types of the messages included therein, for the particular target endpoint or type thereof. For example, where the recorded messages include add requests, delete requests, and modify requests, the add requests may be clustered into a first group, the delete requests may be clustered into a second group, and the modify requests may be clustered into a third group. The clustering may be performed responsive to receiving a user input indicating the operation type of one or more of the messages, or may be performed using a distance function to calculate similarities among the recorded messages.

Based on commonalities among the messages of each group, a message structure used by the target endpoint for each of the operation types may be determined at block 510. For example, the add requests of the first group may be aligned according to respective positions thereof and analyzed to identify constant and variable sections of a message structure for the add request operation type, based on a frequency of occurrence of common characters at the respective positions of the aligned add requests. In some embodiments, a request prototype including fields representing the constant and variable sections may be created for each of the operation types. For instance, for the add request operation type, an add request prototype may be created, where the fields of the add request prototype representing the constant sections of the add message structure may include corresponding ones of the common characters indicated by the alignment of the add requests. A delete request prototype and a modify request prototype may be similarly created based on alignment of the delete requests and modify requests of the second and third groups discussed above, respectively.

At block 515, a request to the target endpoint for a desired one of the operation types may be generated according to the corresponding one of the message structures used by the target endpoint. In the above example, when it is desired to add a user identity to the target endpoint, the add request prototype may be selected from among the add, delete, and modify request prototypes, and the fields of the selected add request prototype corresponding to the variable sections of the add message structure may be populated with corresponding data from the user identity to generate the add request to the target endpoint. The generated request may thus be transmitted to the target endpoint, without knowledge or otherwise independent of the communications protocol(s) used by the target endpoint.

As discussed above, for each type of endpoint (for example, OpenLDAP or MySQL), as well as for alternative configurations of an endpoint (e.g., different schemas), the universal connector is trained to learn the message structure of the endpoint. A goal of the training is to infer how to translate fields from the virtual directory's schema to the target endpoint's message structure for each operation type. FIGS. 6A and 6B illustrate two alternative techniques for training the universal connector in accordance with embodiments of the present disclosure. The flowchart of FIG. 6A relies on some human input, while the flowchart of FIG. 6B is directed to unsupervised machine learning. In some embodiments, the training process may be applied not only for each type of endpoint, but also for alternative configurations of endpoints, particularly where the configuration is outside the previously observed training data.

Referring now to FIG. 6A, a human supervised method for training the universal connector in accordance with embodiments of the present disclosure can obtain input from the user regarding operation types and/or message field data sent during the message recording phase. In particular, after monitoring and recording messages communicated between a target endpoint and an existing client at block 600, user input specifying operation types of the stored messages is received at block 603. For example, in some embodiments, a user input tool may be provided, which may prompt a user for information about each message sent by the existing client during the message recording at block 600. For each message which is sent from the client, the user input tool may prompt the user to specify the operation type of the message (e.g., Add, Delete, Modify) and/or the data contained in the message fields in the context of the virtual directory's schema. For instance, if the virtual directory is using SCIM (System for Cross-domain Identity Management) as its schema, a traffic recorder (such as the message monitor 125 of FIG. 1) may observe the following message:

<Add>
    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
</Add>

The user input tool may prompt the user to specify that (i) the message is of the “Add” operation type, (ii) the message contains the field “name: givenName” with the value “John”, and (iii) the message contains the field “name: familyName” with the value “Smith”. The user input tool may provide dropdown boxes, which can allow the user to map values within the sent message onto the virtual directory's schema. The user may also provide a sample of messages for each operation type with a range of values for each field.

Responsive to receiving the user input at block 603, the messages are clustered, based on the user-specified operation types, into respective groups corresponding to the respective operation types at block 605. The recorded messages in each group can be analyzed to extract the message structure for each operation type, for instance, by mapping the user input(s) onto the recorded messages sent by the client to the endpoint. In some embodiments, this may be accomplished by searching for the field values within the corresponding messages, and marking the bytes within the messages that correspond to the user input.
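One simple way to picture the search-and-mark step is sketched below. The function `mark_fields`, the `${...}` placeholder notation, and the dotted field names are illustrative assumptions. The sketch marks only the first occurrence of each value and rejects overlapping matches, which a practical implementation would need to handle more carefully.

```python
def mark_fields(message, field_values):
    """Replace each user-specified value found in `message` with a
    placeholder naming the virtual-directory field it was mapped to."""
    spans = []
    for field_name, value in field_values.items():
        start = message.find(value)  # first occurrence only (assumption)
        if start == -1:
            raise ValueError(f"value for {field_name!r} not found in message")
        spans.append((start, start + len(value), field_name))
    spans.sort()
    pieces, cursor = [], 0
    for start, end, field_name in spans:
        if start < cursor:
            raise ValueError("marked field values overlap")
        pieces.append(message[cursor:start])      # constant section
        pieces.append("${" + field_name + "}")    # marked variable section
        cursor = end
    pieces.append(message[cursor:])
    return "".join(pieces)
```

Applied to the recorded Add message with the user's mappings for “John” and “Smith”, this yields a marked message in which the two values are replaced by placeholders for the given name and family name fields.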

At block 609, request messages of each group are aligned according to respective positions or fields thereof. For example, a multiple sequence alignment algorithm (such as CLUSTALW) may be applied to the marked messages for each operation type, with a constraint that the user specified data fields within the messages are to be aligned. In some embodiments, this may be achieved by giving a relatively higher weighting to the alignment of these marked bytes.

Still referring to FIG. 6A, the format or structure of messages used by the target endpoint may be extracted or otherwise determined, for each operation type, from the aligned message sequences of each group. In particular, at block 610, constant and variable sections of the message structure for each operation type are identified based on commonalities at respective positions of the aligned request messages of each group. In some embodiments, the constant and variable sections may be inferred based on a frequency of occurrence of the commonalities at the respective positions of the requests of each group as indicated by the aligning at block 609, for example, using methods described in U.S. patent application Ser. No. 14/535,950 entitled “Systems and Methods for Automatically Generating Message Prototypes for Accurate and Efficient Opaque Service Emulation,” filed Nov. 7, 2014, the disclosure of which is incorporated by reference herein in its entirety. In particular, after completing a multiple sequence alignment, a consensus may be calculated by selecting, at each byte (or character) position, the most commonly occurring byte or character at that position, provided the byte/character has a relative frequency of occurrence above a predetermined threshold. As such, at block 611, a request prototype including fields representing the identified constant and variable sections is created for each operation type. Each request prototype may include a particular byte/character at a particular position when there is a consensus (based on the frequency of occurrence among the messages of the corresponding group) as to the commonality of the byte/character at that position. In some embodiments, positions for which there is no consensus may be left as a gap in the corresponding request prototype.

At run-time, requests may be generated in a message format expected by the target endpoint based on the determined message structures and/or generated request prototypes. In particular, responsive to receiving a request from a virtual directory (such as the virtual directory 112 of FIG. 1) for a desired operation type, one of the request prototypes that corresponds to the desired operation type is selected at block 613. The fields of the selected request prototype that represent the variable sections of the corresponding message structure are populated with data from an external database to compose a request in accordance with a message structure used by the target endpoint at block 615. For example, values from the request that was received from the virtual directory may be substituted into corresponding positions of the request prototype for relevant ones of the fields that represent the variable sections of the corresponding message structure. The generated request is thus transmitted to the target endpoint at block 620.

Alternatively, referring to FIG. 6B, an unsupervised training method in accordance with embodiments of the present disclosure may allow for training of the universal connector for communication with a target endpoint without significant user input. In particular, at block 600, a sample of messages exchanged between a client and the target endpoint is monitored using a traffic recorder, such as the message monitor 125 of FIG. 1, and is recorded in a database, such as the message library 130 of FIG. 1. The recorded traffic contains a sample including multiple messages of different operation types. In general, a larger number of samples stored in the message library may allow for more effective training of the universal connector.

After recording and storing a sufficient set of sample messages at block 600, similar ones of the messages are automatically clustered into respective groups at block 606. A data clustering algorithm can be used to automatically group similar messages. Examples of clustering algorithms which may be used include K-Means, Hierarchical Clustering, a density based clustering algorithm, and/or a visual clustering algorithm such as VAT (Visual Assessment of cluster Tendency). Clustering algorithms may involve a distance function. An example of a distance function that may be used to cluster the messages is the Needleman-Wunsch sequence alignment algorithm. The distance function may be applied to only the request messages, only the response messages, or to pairs of request and response messages. Each group may thus include messages of the same operation type, based on the computed similarities of the messages therein.
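The following is a deliberately simplified sketch of distance-based grouping, not one of the named algorithms. It uses a greedy threshold rule in place of K-Means or hierarchical clustering, and substitutes the similarity ratio of Python's standard `difflib.SequenceMatcher` for a Needleman-Wunsch score. The function names and the 0.4 distance threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Dissimilarity in [0, 1]; 0 means the two messages are identical."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def cluster_messages(messages, max_distance=0.4):
    """Greedily assign each message to the first cluster containing a
    sufficiently similar member, otherwise start a new cluster."""
    clusters = []
    for message in messages:
        for cluster in clusters:
            if any(distance(message, member) <= max_distance
                   for member in cluster):
                cluster.append(message)
                break
        else:
            clusters.append([message])  # no similar cluster was found
    return clusters
```

Given a mixed sample of recorded Add and Delete requests, the Add requests would be expected to fall into one group and the Delete requests into another, since requests of the same operation type share most of their constant sections.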

Having clustered the messages into groups corresponding to respective operation types at block 606, common patterns for the request messages in each cluster may be identified. In particular, at block 609, the request messages of each group are aligned by respective positions thereof. For instance, a multiple sequence alignment algorithm may be applied to the request messages for each cluster, and the format or structure of the messages of each group may be extracted or otherwise determined from the aligned request messages. In particular, at block 610, constant and variable sections of the message structure for each operation type are inferred or otherwise identified based on commonalities at respective positions of the aligned request messages of each group. In some embodiments, the constant and variable sections may be inferred based on a frequency of occurrence of the commonalities at the respective positions of the requests of each group as indicated by the aligning at block 609, for example, using methods described in U.S. patent application Ser. No. 14/535,950, as noted above.

At block 611, a request prototype including fields representing the identified constant and variable sections is created for each group. The fields of each request prototype representing the constant sections may be populated based on the frequency of occurrence of bytes/characters at respective positions of the requests of each group, while the fields of each request prototype representing the variable sections may be available for population with data from an external source, as discussed below.

Still referring to FIG. 6B, at block 612, a user input is received that specifies which cluster or group of messages corresponds to each operation type, thus indicating a correspondence between each request prototype and each operation type. In some embodiments, the received user input may further specify the mappings from the virtual directory's schema to the request fields for each operation type. For example, for each cluster or group, the received user input may specify not only the operation type (e.g., Add, Delete, etc.), but may also map or otherwise indicate correspondence between fields of the virtual directory's schema and fields in the request. It will be understood that it may not be necessary to map all of the fields in the virtual directory's schema, as mapping of a subset of the fields may be sufficient in some embodiments.

In some embodiments, because the clusters/groups and/or the fields extracted from the recorded traffic may be unlabeled, automatically generated labels may be presented via a user interface to assist the user in the mapping process. For example, labels may be generated by numbering the clusters and fields, e.g., clusters may be labelled as “Operation 1”, “Operation 2”, etc., and fields may be labelled “Field 1”, “Field 2”, etc. A more sophisticated labelling system may use heuristics to automatically label fields. The field length, the character contents, the field ordering, and/or other features may be used to probabilistically predict which fields in each request prototype correspond to a particular type or category of field. For example, if a field is 5 characters long and contains numeric values, it may be inferred that this field is a US ZIP code. If the field contains alphabetic characters, then depending on the length, it may be predicted that the field includes a given name or a family name.
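
For purposes of illustration, the labelling heuristics described above may be sketched as a small set of rules applied to the values observed in a variable field across the recorded requests. The sketch is rule-based rather than probabilistic, and the function name `guess_label`, the label strings, and the sample values are assumptions made for the example.

```python
def guess_label(samples, fallback):
    """Suggest a label for a field from the values observed in that field.

    `samples` holds the values seen at the field across recorded requests;
    `fallback` is the numbered label (e.g., "Field 3") used when no rule applies.
    """
    if not samples:
        return fallback
    # Five ASCII digits in every sample: possibly a US ZIP code.
    if all(len(s) == 5 and s.isascii() and s.isdigit() for s in samples):
        return "US ZIP code (guess)"
    # Purely alphabetic samples: possibly a given name or a family name.
    if all(s.isalpha() for s in samples):
        return "name (guess)"
    return fallback


zip_label = guess_label(["90210", "10001"], "Field 1")
name_label = guess_label(["Alice", "Bob"], "Field 2")
other_label = guess_label(["a1", "b2"], "Field 3")
```

The suggested labels are marked as guesses so that a user reviewing them via the user interface may confirm or correct them during the mapping process.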

However, it will be understood that receipt of the user input at block 612 may not be necessary in some embodiments of the present disclosure. In particular, in some embodiments, the user input of block 612 may be omitted, and the operation types of each cluster may be automatically inferred, for example, using heuristics based on the previously recorded messages.

At run-time, requests may be generated in a message format expected by the target endpoint based on the determined message structures and/or generated request prototypes. In particular, responsive to receiving a request from a virtual directory (such as the virtual directory 112 of FIG. 1) for a desired operation type, a corresponding one of the request prototypes is selected at block 613. The fields of the selected request prototype that represent the variable sections of the corresponding message structure are populated with data from an external database to compose a request in accordance with a message structure used by the target endpoint at block 615. For example, when a request to associate a user identity from a corporate store (such as the identity data store 110 of FIG. 1) with an account of the target endpoint is received, values from the virtual directory's request may be substituted into corresponding positions of the request prototype for relevant ones of the fields that represent the variable sections of the corresponding message structure. The generated request is then transmitted to the target endpoint at block 620.
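
A minimal sketch of the run-time selection and population of blocks 613 and 615 follows, assuming that each request prototype is held as an ordered list of constant text and named variable fields, and that the variable fields have been mapped to attribute names of the virtual directory's schema. The prototype contents, the attribute names `uid` and `mail`, and the function names are hypothetical; transmission to the target endpoint at block 620 is not shown.

```python
# Hypothetical prototype for an "Add" operation: constant text interleaved
# with variable fields named after virtual-directory schema attributes.
ADD_PROTOTYPE = [
    ("const", "ADD uid="),
    ("var", "uid"),
    ("const", ";mail="),
    ("var", "mail"),
]

# One prototype per operation type, as established during training.
PROTOTYPES = {"Add": ADD_PROTOTYPE}


def compose_request(prototype, values):
    """Substitute values into the variable fields of a prototype."""
    parts = []
    for kind, payload in prototype:
        if kind == "const":
            parts.append(payload)
        else:
            if payload not in values:
                raise KeyError(f"no value supplied for variable field {payload!r}")
            parts.append(str(values[payload]))
    return "".join(parts)


def generate_request(operation_type, values):
    """Select the prototype for the operation type and populate it."""
    return compose_request(PROTOTYPES[operation_type], values)


request = generate_request("Add", {"uid": "alice", "mail": "alice@example.com"})
```

Raising an error for a missing value, rather than emitting an incomplete request, is one possible design choice; other embodiments may instead leave unmapped fields at a default value.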

In some embodiments, the universal connector may include built-in (i.e., preprogrammed) knowledge of various encoding formats that may be used by a target endpoint, for improved accuracy. Example encoding formats include XML, JSON, ASN.1, etc. The user input received at block 603 or 612 may specify the encoding format of the recorded messages in some embodiments, while in some other embodiments the encoding format of the recorded messages may be automatically detected. During the traffic analysis (training) phase, if the messages are encoded in a standard encoding format, then the messages may be initially decoded into a canonical format, before the remainder of the training phase is performed. During the run-time phase, requests can be encoded into the target endpoint's encoding format before being sent to the target endpoint at block 620. If the recorded messages are compressed or encrypted, decompression and/or decryption may be performed before the training phase. At the run-time phase, the generated requests may be similarly compressed and/or encrypted to conform to the target endpoint's specifications.
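
As one hedged example of the decoding and encoding steps described above, recorded messages that are JSON-encoded and optionally zlib-compressed may be normalized before training and re-encoded at run-time. The choice of compact, key-sorted JSON as the canonical format, the use of zlib, and the function names `to_canonical` and `to_wire` are assumptions made for the sketch; the disclosure does not define a particular canonical format.

```python
import json
import zlib


def to_canonical(raw: bytes, compressed: bool = False) -> str:
    """Decode a recorded message into an assumed canonical text form."""
    if compressed:
        raw = zlib.decompress(raw)
    parsed = json.loads(raw.decode("utf-8"))
    # Sorting keys and fixing separators gives equivalent messages the same
    # text, which may make position-wise comparison more reliable.
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"))


def to_wire(canonical: str, compressed: bool = False) -> bytes:
    """Encode a generated request for transmission to the target endpoint."""
    data = canonical.encode("utf-8")
    return zlib.compress(data) if compressed else data


# Made-up recorded message, compressed as it might appear on the wire.
recorded = zlib.compress(b'{"uid": "alice", "op": "add"}')
canonical = to_canonical(recorded, compressed=True)
wire = to_wire(canonical, compressed=True)
```

A target endpoint that is sensitive to key order or whitespace would require the run-time encoder to reproduce the endpoint's own layout rather than the canonical one used for training.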

Aspects of the present disclosure have been described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable instruction execution apparatus, create a mechanism for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. As used herein, “a processor” may refer to one or more processors.

These computer program instructions may also be stored in a computer readable medium that when executed can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions when stored in the computer readable medium produce an article of manufacture including instructions which when executed, cause a computer to implement the function/act specified in the flowchart and/or block diagram block or blocks. The computer program instructions may also be loaded onto a computer, other programmable instruction execution apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatuses or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various aspects of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the “C” programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS).

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting to other embodiments. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including”, “have” and/or “having” (and variants thereof) when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. In contrast, the term “consisting of” (and variants thereof) when used in this specification, specifies the stated features, integers, steps, operations, elements, and/or components, and precludes additional features, integers, steps, operations, elements and/or components. Elements described as being “to” perform functions, acts and/or operations may be configured to or otherwise structured to do so. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items and may be abbreviated as “/”.

It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the various embodiments described herein.

Many different embodiments have been disclosed herein, in connection with the above description and the drawings. It will be understood that it would be unduly repetitious and obfuscating to literally describe and illustrate every combination and subcombination of these embodiments. Accordingly, all embodiments can be combined in any way and/or combination, and the present specification, including the drawings, shall support claims to any such combination or subcombination.

In the drawings and specification, there have been disclosed typical embodiments and, although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation, the scope of the disclosure being set forth in the following claims.

1. A computer-implemented method, comprising: recording, in a computer-readable memory, a plurality of messages communicated between a target endpoint and a client; clustering ones of the messages into respective groups, wherein the respective groups correspond to respective operation types of the ones of the messages included therein; determining, for the respective operation types, respective message structures used by the target endpoint based on commonalities among the ones of the messages of the respective groups corresponding thereto; and generating, for one of the respective operation types, a request to the target endpoint in accordance with a corresponding one of the respective message structures used by the target endpoint, wherein the recording, the clustering, the determining, and the generating comprise operations performed by a processor.
2. The method of claim 1, wherein determining the respective message structures for the respective operation types comprises: identifying, for the respective operation types, constant and variable sections of the respective message structures based on the commonalities among the ones of the messages of the respective groups corresponding thereto and independent of a communications protocol used by the target endpoint.
3. The method of claim 2, wherein the ones of the messages of the respective groups comprise requests to the target endpoint, and wherein identifying the constant and variable sections of the respective message structures for the respective operation types further comprises: aligning the requests of the respective groups according to respective positions thereof; and identifying, for the respective operation types, the constant and variable sections of the respective message structures based on a frequency of occurrence of the commonalities at the respective positions of the requests of the respective groups corresponding thereto indicated by the aligning.
4. The method of claim 3, wherein the commonalities comprise common characters that are present at the respective positions of ones of the requests of the respective groups, and further comprising: creating, for the respective operation types, respective request prototypes comprising fields representing the constant and variable sections of the respective message structures used by the target endpoint for the respective operation types, wherein, for the respective request prototypes, the fields representing the constant sections include corresponding ones of the common characters.
5. The method of claim 4, wherein generating the request to the target endpoint for the one of the respective operation types comprises: selecting one of the request prototypes corresponding to the one of the operation types; and for the one of the request prototypes, populating ones of the fields representing the variable sections with data from an external database to generate the request to the target endpoint.
6. The method of claim 5, wherein the data corresponds to a user identity in a system employing a message structure different from the respective message structures used by the target endpoint, and wherein the request associates the user identity with an account of the target endpoint.
7. The method of claim 1, wherein the clustering comprises: clustering the ones of the messages into the respective groups responsive to receiving user input indicating the respective operation types of the ones of the messages.
8. The method of claim 1, wherein the clustering comprises: clustering the ones of the messages into the respective groups based on similarities therebetween calculated using a distance function; and then classifying the respective groups as corresponding to the respective operation types responsive to receiving user input indicative thereof.
9. A computer system, comprising: a processor; and a memory coupled to the processor, the memory comprising computer readable program code embodied therein that, when executed by the processor, causes the processor to perform operations comprising: recording, in a computer-readable memory, a plurality of messages communicated between a target endpoint and a client; clustering ones of the messages into respective groups, wherein the respective groups correspond to respective operation types of the ones of the messages included therein; determining, for the respective operation types, respective message structures used by the target endpoint based on commonalities among the ones of the messages of the respective groups corresponding thereto; and generating, for one of the respective operation types, a request to the target endpoint in accordance with a corresponding one of the respective message structures used by the target endpoint.
10. The computer system of claim 9, wherein, in determining the respective message structures for the respective operation types, the computer readable program code further causes the processor to perform operations comprising: identifying, for the respective operation types, constant and variable sections of the respective message structures based on the commonalities among the ones of the messages of the respective groups corresponding thereto and independent of a communications protocol used by the target endpoint.
11. The computer system of claim 10, wherein the ones of the messages of the respective groups comprise requests to the target endpoint, and wherein, in identifying the constant and variable sections of the respective message structures for the respective operation types, the computer readable program code further causes the processor to perform operations comprising: aligning the requests of the respective groups according to respective positions thereof; and identifying, for the respective operation types, the constant and variable sections of the respective message structures based on a frequency of occurrence of the commonalities at the respective positions of the requests of the respective groups corresponding thereto indicated by the aligning.
12. The computer system of claim 11, wherein the commonalities comprise common characters that are present at the respective positions of ones of the requests of the respective groups, and wherein the computer readable program code further causes the processor to perform operations comprising: creating, for the respective operation types, respective request prototypes comprising fields representing the constant and variable sections of the respective message structures used by the target endpoint for the respective operation types, wherein, for the respective request prototypes, the fields representing the constant sections include corresponding ones of the common characters.
13. The computer system of claim 12, wherein, in generating the request to the target endpoint for the one of the respective operation types, the computer readable program code further causes the processor to perform operations comprising: selecting one of the request prototypes corresponding to the one of the operation types; and for the one of the request prototypes, populating ones of the fields representing the variable sections with data from an external database to generate the request to the target endpoint.
14. The computer system of claim 13, wherein the data corresponds to a user identity in a system employing a message structure different from the respective message structures used by the target endpoint, and wherein the request associates the user identity with an account of the target endpoint.
15. A computer program product, comprising: a computer readable storage medium having computer readable program code embodied in the medium, the computer readable program code comprising: computer readable code to record, in a computer-readable memory, a plurality of messages communicated between a target endpoint and a client; computer readable code to cluster ones of the messages into respective groups, wherein the respective groups correspond to respective operation types of the ones of the messages included therein; computer readable code to determine, for the respective operation types, respective message structures used by the target endpoint based on commonalities among the ones of the messages of the respective groups corresponding thereto; and computer readable code to generate, for one of the respective operation types, a request to the target endpoint in accordance with a corresponding one of the respective message structures used by the target endpoint.
16. The computer program product of claim 15, wherein the computer readable code to determine the respective message structures for the respective operation types further comprises: computer readable code to identify, for the respective operation types, constant and variable sections of the respective message structures based on the commonalities among the ones of the messages of the respective groups corresponding thereto and independent of a communications protocol used by the target endpoint.
17. The computer program product of claim 16, wherein the ones of the messages of the respective groups comprise requests to the target endpoint, and wherein the computer readable code to identify the constant and variable sections of the respective message structures for the respective operation types further comprises: computer readable code to align the requests of the respective groups according to respective positions thereof; and computer readable code to identify, for the respective operation types, the constant and variable sections of the respective message structures based on a frequency of occurrence of the commonalities at the respective positions of the requests of the respective groups corresponding thereto indicated by the aligning.
18. The computer program product of claim 17, wherein the commonalities comprise common characters that are present at the respective positions of ones of the requests of the respective groups, and wherein the computer readable program code further comprises: computer readable code to create, for the respective operation types, respective request prototypes comprising fields representing the constant and variable sections of the respective message structures used by the target endpoint for the respective operation types, wherein, for the respective request prototypes, the fields representing the constant sections include corresponding ones of the common characters.
19. The computer program product of claim 18, wherein the computer readable code to generate the request to the target endpoint for the one of the respective operation types comprises: computer readable code to select one of the request prototypes corresponding to the one of the operation types; and computer readable code to populate, for the one of the request prototypes, ones of the fields representing the variable sections with data from an external database to generate the request to the target endpoint.
20. The computer program product of claim 19, wherein the data corresponds to a user identity in a system employing a message structure different from the respective message structures used by the target endpoint, and wherein the request associates the user identity with an account of the target endpoint.